-
-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ContainExactly matcher speed when elements obey transitivity #1325
Improve ContainExactly matcher speed when elements obey transitivity #1325
Conversation
This is a small proof of concept to suggest a path forward for improving the performance of |
c00b44c
to
57ccbff
Compare
@pirj Had to update a few things to pass CI for all ruby versions. If you think it's reasonable, I can try to put out a more comprehensive PR. What do you think of this approach? |
Not to litter in contained threads, I'll just post my findings here. I have a limited experience with practical algorithm application, unfortunately. https://en.wikipedia.org/wiki/Stable_marriage_problem
https://en.wikipedia.org/wiki/Lattice_of_stable_matchings https://brilliant.org/wiki/matching/
https://brilliant.org/wiki/matching-algorithms/ In any case, O(n!) seems to be ridiculously expensive. Building a lattice is O(N^2), and shaking it has a chance to be less computationally extensive, but I suppose will consume more memory. Does the practical task boil down to finding the local minimum total of extra and missing elements? |
b78dd66
to
42dda47
Compare
👋 @pirj, I'm gonna reply in-line just to keep a summary of where we're at on this PR. You had two recent bits of feedback that I've addressed below:
I think there are a few ways to address your concerns above: 1. Better naming: I'm not wedded to transitive. We could call it
Awesome, I think we're on the same page (and thanks for being so helpful in your responses). I've just refactored my code to calculate extra and missing items in O(n) time when transitive is used. This addresses your earlier concern about I think it'd be a really nice improvement to When you get a chance (you've already helped tons, thank you!), would you mind letting me know if you'd be open to merging this functionality into RSpec? |
Just skimmed through your comments and I truly like the idea to detect if all items are transitive by making a check if the sort succeeded. Wondering if |
710aee6
to
5cc1ea7
Compare
…ents obey transitivity This is a proof of concept approach for addressing issue rspec#1161. The current implementation for ContainExactly runs in O(n!). In practice, it runs in O(n log n) when the elements are comparable and sorting result in a match. The crux of the problem is that some elements don't obey transitivity. As a result, knowing that sorting actual and expected doesn't result in a match *doesn't* guarantee that expected and actual don't match. This proof of concept provides a way for the user to indicate that the elements in a particular example's expected and actual obey transitivity. That looks like this: expect(a).to contain_exactly(*b).transitive And runs in O(n log n) time. More practically, this means that common use cases for contains_exactly will enjoy a massive speedup. Previously, users have examples where comparing arrays of 30 integers "never finishes." Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s on my machine.
5cc1ea7
to
0dae38c
Compare
I dared to push some cosmetic changes, what do you think of such an auto-detection of transitivity? |
Thanks for doing that! I think this is great and a clear improvement over If you're happy with this, would you mind approving? If so, I'll update the PR title and message since it's no longer a proof of concept and doesn't use |
I'm all good with the change. The only thing I think needs addressing are specs - there is a very similar example with 10 numbers and Timeout check. It makes sense to combine or at least put those two closer to each other.
@JonRowe @benoittgt wdyt?
|
@pirj Yeah, that makes a lot of sense. I just refactored the tests so that my speed tests are near the 10 number + Timeout check. Cleaned a few things up with shared examples and How does it look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! It's a huge win for this use case.
Thanks a lot for the contribution!
Here's a different idea for the implementation. What if in the This implementation might be considered more straightforward because it doesn't introduce a new branches to the code flow, but optimizes the existing implementation. Here's a prototype: #1328 |
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays. This addresses rspec#1006, rspec#1161. This PR is a collaboration between me and @genehsu based on a couple of our earlier PRs and discussion that resulted: 1) rspec#1325 2) rspec#1328 Co-authored-by: Gene Hsu (@genehsu)
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays. This addresses rspec#1006, rspec#1161. This PR is a collaboration between me and @genehsu based on a couple of our earlier PRs and discussion that resulted: 1) rspec#1325 2) rspec#1328 Co-authored-by: Gene Hsu (@genehsu)
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays. This addresses rspec#1006, rspec#1161. This PR is a collaboration between me and @genehsu based on a couple of our earlier PRs and discussion that resulted: 1) rspec#1325 2) rspec#1328 Co-authored-by: Gene Hsu (@genehsu)
Closing in favor of #1333 |
This addresses issues #1006, #1161.
The current implementation for
ContainExactly
runs in O(n!). The crux of the problem is that some elements don't obey transitivity. As a result, knowing that sorting actual and expected doesn't result in a match doesn't guarantee that expected and actual don't match.This PR makes improvements that ensure any test with comparable elements in
actual
andexpected
runs in O(n log n) time.There are two main updates:
Update the core logic in
ContainExactly
to:actual
andexpected
This means that when
actual
andexpected
contain comparable elements, we can immediately returnfalse
when the sorted arrays don't match. This speeds up passing tests whereactual
andexpected
don't match from O(n!) to O(n log n).Speed up determining
extra
andmissing
elementsPreviously, the logic for determining
extra
andmissing
items betweenactual
andexpected
relied on code inPairingsMaximizer
that generated all possible element pairings. So even if you avoided comparing all possible pairings to determine if the matcher should return true, you would still incur that O(n!) cost when generating a failure message.The code now calculates
extra
andmissing
in linear time. This speeds up failing tests from O(n!) to O(n log n).More practically, this means that common use cases for
ContainExactly
will enjoy a massive speedup. Previously, users have examples where comparing arrays of 30 integers "never finishes." With this PR's update, comparing arrays of 10,000 integers runs in < 0.1s on my machine.